Problem Statement

The California Department of Public Health detected an outbreak of a novel infectious respiratory disease in California between May and December 2023 and collected information on case counts and case severity, along with demographic information on infected individuals. This report examines the course of the outbreak and assesses whether it disproportionately affected particular demographic or geographic populations. In particular, race/ethnicity, county, and age are examined to identify the populations that may benefit most from prevention and treatment resources.

Methods

Dataset 1: Simulated Novel Infectious Disease case reporting for California

This source contains simulated weekly infectious disease case counts for each county in California, as reported by public health agencies and organizations such as county health departments, from late May 2023 through the end of December 2023. The infection data are linked with demographic information: age group, binary gender, and race and ethnicity.

Cleaning

  1. Column names converted to snake case and renamed to match the other sources.
  2. Diagnosis dates read in dmy format using lubridate.
  3. Race/ethnicity categories recoded with human-readable values to match the LA County dataset.
  4. Time interval data recoded into epiweeks.
  5. County names cleaned by removing the word “county” from the value in each row.
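The name, date, and county cleaning steps above can be sketched as follows. This is an illustration in Python rather than the R/lubridate code used for the report; the helper names and the assumed day/month/year layout with slashes are hypothetical, since the raw column headers and date strings are not shown here.

```python
import re
from datetime import date, datetime


def to_snake_case(name: str) -> str:
    """Convert a header such as 'Dt Diagnosis' or 'raceEthnicity' to snake_case."""
    # Insert an underscore at lower-to-upper boundaries (camelCase).
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()


def parse_dmy(text: str) -> date:
    """Parse a day-month-year string. ASSUMPTION: the layout is DD/MM/YYYY."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").date()


def strip_county(value: str) -> str:
    """Drop a trailing 'County' (any capitalization) from a county name."""
    return re.sub(r"\s+county$", "", value.strip(), flags=re.IGNORECASE)
```

For example, strip_county("Alameda County") returns "Alameda", which lets the state data join cleanly with sources that store the bare county name.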

Dataset 2: Simulated Novel Infectious Disease case reporting for Los Angeles County

This is a simulated dataset of weekly reported cases of the disease in Los Angeles County, categorized by date of diagnosis and patient demographics, with cumulative totals for infected, unrecovered, and severe cases. Such data would have been collected by public health agencies across LA County. Data were collected from late May 2023 through the end of December 2023. The patient demographics provided (age group, binary gender, and race and ethnicity) help determine whether rates of infection differ among populations.

Cleaning

  1. Column names converted to snake case and renamed to match the other sources.
  2. Diagnosis dates read in dmy format using lubridate.
  3. Epiweek column derived from the diagnosis dates.
  4. A “county” column created and populated with the value “Los Angeles” to match the statewide infection data source.
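The epiweek and county steps above can be illustrated with a short Python sketch. It assumes the US CDC (MMWR) definition of an epidemiological week, in which weeks run Sunday through Saturday and week 1 is the first week with at least four days in the new year; the function names and the dt_diagnosis field are hypothetical, not taken from the report's code.

```python
from datetime import date, timedelta


def epiweek(d: date) -> tuple[int, int]:
    """Return (epi_year, epi_week) under the assumed CDC/MMWR convention."""
    # Days elapsed since the most recent Sunday (Sunday -> 0, Saturday -> 6).
    days_since_sunday = (d.weekday() + 1) % 7
    # A Sunday-Saturday week has at least four days in the year that
    # contains its Wednesday, so the Wednesday fixes the epi year.
    wednesday = d - timedelta(days=days_since_sunday) + timedelta(days=3)
    # The week number is the position of that Wednesday among the
    # Wednesdays of its year.
    week = (wednesday.timetuple().tm_yday - 1) // 7 + 1
    return wednesday.year, week


def add_epiweek_and_county(rows: list[dict], county: str = "Los Angeles") -> list[dict]:
    """Return copies of the rows with 'epiweek' and a constant 'county' added."""
    return [
        {**row, "epiweek": epiweek(row["dt_diagnosis"])[1], "county": county}
        for row in rows
    ]
```

Returning the epi year alongside the week matters at year boundaries: under this convention 31 December 2023 falls in week 1 of 2024, so a week number on its own would be ambiguous there.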

Dataset 3: California estimated population for 2023

This dataset is simulated but approximates what population data from the State of California might look like. It includes 2023 population estimates attributed to the California Department of Finance, broken down by county and demographic category (age, race, and sex).

Cleaning

  1. Column names converted to snake case and renamed to match the other sources.
  2. Race/ethnicity categories recoded to match the LA County dataset.
  3. First three age categories (0-4, 5-11, and 12-17) combined into a single 0-17 category to match the other two datasets.
  4. Health officer region data removed.
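Combining the three youngest age categories amounts to relabeling them as 0-17 and summing the population within each resulting group. A minimal Python sketch follows; the field names (county, race_ethnicity, age_cat, pop) are assumed for illustration, and only the age labels 0-4, 5-11, 12-17, and 0-17 come from the description above.

```python
from collections import defaultdict

# Age labels in the population data that together make up the 0-17 group.
YOUNG_AGE_LABELS = {"0-4", "5-11", "12-17"}


def collapse_young_ages(rows: list[dict]) -> list[dict]:
    """Relabel the three youngest age groups as '0-17' and sum their populations."""
    totals: dict[tuple, int] = defaultdict(int)
    for row in rows:
        age = "0-17" if row["age_cat"] in YOUNG_AGE_LABELS else row["age_cat"]
        totals[(row["county"], row["race_ethnicity"], age)] += row["pop"]
    return [
        {"county": c, "race_ethnicity": r, "age_cat": a, "pop": p}
        for (c, r, a), p in totals.items()
    ]
```

Summing after relabeling, rather than simply renaming the categories, keeps one population row per group, which avoids duplicate keys in later joins.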

Analytic methods

First, the two infection datasets (Datasets 1 and 2) were combined by binding their rows, that is, appending the rows of one dataset to the other. This produced a single table of infection data for all counties in California.

Next, strata of interest were identified. Because the analysis concerns the distribution of infections across race/ethnicity and geographic categories, the rows were grouped by county and then by race/ethnicity. Both datasets record new infections and new severe infections in separate columns, so each of these two columns was summed within each stratum.
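The row binding and stratified summation described above can be sketched in Python as follows. The column names new_infections and new_severe are placeholders for the two count columns, not names confirmed by the datasets.

```python
from collections import defaultdict


def summarize_by_stratum(state_rows: list[dict], la_rows: list[dict]) -> list[dict]:
    """Bind two case tables, then sum counts per (county, race_ethnicity) stratum."""
    # Row binding: the combined table is one dataset appended to the other.
    combined = list(state_rows) + list(la_rows)

    totals: dict[tuple, dict] = defaultdict(
        lambda: {"new_infections": 0, "new_severe": 0}
    )
    for row in combined:
        stratum = (row["county"], row["race_ethnicity"])
        totals[stratum]["new_infections"] += row["new_infections"]
        totals[stratum]["new_severe"] += row["new_severe"]

    return [
        {"county": county, "race_ethnicity": race, **counts}
        for (county, race), counts in totals.items()
    ]
```

Binding rows in this way only works once both tables share the same column names and category labels, which is what the cleaning steps for Datasets 1 and 2 are for.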

This stratified summary of weekly new and severe case counts was then left-joined to the California population dataset, using county and race_ethnicity as keys. The resulting table contains, for each stratum, the count of weekly new infections, the count of new severe infections, and the total population.
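A left join keeps every row of the case summary and attaches the matching population, leaving a missing value where no match exists. The Python sketch below assumes the population table may still be split by age and sex, so it first totals population per key; the field names other than county and race_ethnicity are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def left_join_population(summary_rows: list[dict], population_rows: list[dict]) -> list[dict]:
    """Left join the case summary to population on (county, race_ethnicity)."""
    # Total the population per join key so each key appears exactly once;
    # otherwise the join would duplicate summary rows.
    pop_totals: dict[tuple, int] = defaultdict(int)
    for p in population_rows:
        pop_totals[(p["county"], p["race_ethnicity"])] += p["pop"]

    joined = []
    for row in summary_rows:
        key = (row["county"], row["race_ethnicity"])
        # Unmatched strata are kept with pop=None, mirroring NA in a left join.
        pop: Optional[int] = pop_totals.get(key)
        joined.append({**row, "pop": pop})
    return joined
```

Checking the joined table for rows with a missing population is a quick way to catch county names or race/ethnicity labels that were not harmonized during cleaning.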

To calculate the ra

Results

Map of weekly new infections by county

Table of weekly new infections per 100 people by race/ethnicity group and county